Replication Data Archive for "Effects of Increased Transparency on Political Divides and MP Behavior: Evidence from Televised Question Hours in the Finnish Parliament" by Nieminen, Jeremias; Simola, Salla; Tukiainen, Janne

Parts of the code are reused from an earlier paper by Simola, Nieminen & Tukiainen, "A century of partisanship in Finnish parliamentary speech" (not yet published as of September 2023).

We first analyze the data using R and Python on a computing cluster (see the folder /cluster/analysis_dates/). The final plots and tables, based on the estimation results in the /cluster/ folder and on additional estimations, are made in Stata.

See the Word document "Guidance_for_main_analyses.docx" for more information about the main analyses.
There are additional short README files for the main estimation folder (/cluster/) and for the final scripts that produce the figures and tables (/final_estimation/).

Most subfolders are zipped; please unzip them before running any analyses.
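
If there are many archives, a short script can extract them in one go. The following is a minimal Python sketch, assuming every archive is a `.zip` file that should be extracted into the folder that contains it; the function name `unzip_all` is hypothetical and not part of the replication package.

```python
"""Hypothetical helper: extract every .zip archive under a folder, in place.

Assumes each archive should be unpacked next to itself; this layout is an
assumption, not something specified by the replication package.
"""
import zipfile
from pathlib import Path


def unzip_all(root: Path) -> list:
    """Extract each *.zip under `root` into its parent directory.

    Returns the archives that were extracted, in sorted order.
    """
    extracted = []
    # sorted() materializes the file list first, so archives created by this
    # call (zips inside zips) are not picked up until the next call
    for archive in sorted(root.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted
```

For example, call `unzip_all(Path("."))` from the top folder of the archive. Archives that are themselves stored inside a `.zip` are only found on a second call.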

Below is a brief guide to where specific types of scripts are located:



### Estimating polarization series, difference-in-differences estimates, and confidence intervals

* See the /cluster/analysis_dates/ folder
* The datasets needed by these scripts are stored in subfolders of /cluster/analysis_dates/ and /cluster/build_dates/
* Some of these calculations may require a substantial amount of computing power, i.e. a computing cluster
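
For readers less familiar with the design, the basic difference-in-differences logic is the change in the treated group minus the change in the control group over the same period. The Python sketch below computes this 2x2 estimate and a percentile-bootstrap confidence interval under simple assumptions: four lists of outcome values, independent resampling within each cell, and 2,000 draws. It is an illustration only; the function names are hypothetical, and the scripts in /cluster/analysis_dates/ may define treatment, periods, and confidence intervals differently.

```python
"""Hypothetical sketch: a 2x2 difference-in-differences estimate with a
percentile-bootstrap confidence interval. Not the estimator used in the paper.
"""
import random
from statistics import mean


def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treated group minus change in the control group."""
    return (mean(treat_post) - mean(treat_pre)) - (mean(ctrl_post) - mean(ctrl_pre))


def bootstrap_ci(cells, n_draws=2000, alpha=0.05, seed=0):
    """Percentile CI: resample each cell with replacement, recompute the DiD.

    `cells` is (treat_pre, treat_post, ctrl_pre, ctrl_post). A fixed seed
    makes the interval identical across runs.
    """
    rng = random.Random(seed)
    draws = sorted(
        did(*(rng.choices(cell, k=len(cell)) for cell in cells))
        for _ in range(n_draws)
    )
    lo = draws[int((alpha / 2) * n_draws)]
    hi = draws[int((1 - alpha / 2) * n_draws) - 1]
    return lo, hi
```

For instance, `did([1, 1], [3, 3], [1, 1], [2, 2])` returns 1: the treated group rises by 2 and the control group by 1.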




### Producing the final figures and tables in the paper (can be done locally, without a computing cluster)

* See the /final_estimation/ folder
* Here we do two things:
1. plot the polarization estimates that were calculated on the computing cluster and saved to datasets;
2. estimate regressions with outcomes other than polarization (e.g., speech length, attendance).




### Re-building datasets from raw text data

#### Building speech datasets from texts and re-estimating polarization from speech data (using a computing cluster)
* See the /cluster/build_dates/ folder
* These calculations require a substantial amount of computing power, i.e. a computing cluster
* The folder is named "build_dates" because we build the same data as in Simola et al. (2023), except that we add dates and use only question hours
* Please note: all the datasets needed by the scripts in the /analysis_dates/ folder are stored in either the /build_dates/ or the /analysis_dates/ folder. To reproduce the exact results of our paper, please use the speech data files provided in these folders. We also provide scripts that can re-create these datasets from the raw text files containing the speeches, but that process (i.e., the code in /cluster/build_dates/) uses the non-deterministic package LANGDETECT to filter out Swedish phrases, so re-created datasets may differ slightly from the provided ones. The conclusions of our paper should nevertheless hold if the data files are re-created with the code in /cluster/build_dates/
* Some of these scripts may require a substantial amount of computing power, i.e. a computing cluster
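
The reproducibility issue noted above can be illustrated with a small sketch, assuming LANGDETECT refers to the Python `langdetect` package. That package samples randomly internally, and its documentation suggests setting `DetectorFactory.seed = 0` to obtain the same result on every run. The helper names `make_detector` and `drop_swedish` are hypothetical; we do not claim the scripts in /cluster/build_dates/ work this way. The detector is passed in as an argument, so the filter can be tested without `langdetect` installed.

```python
"""Hypothetical sketch: filtering out Swedish sentences reproducibly.

Assumes LANGDETECT is the Python `langdetect` package; the helper names
are illustrative, not functions from the replication scripts.
"""
from typing import Callable, Iterable, List, Optional


def make_detector() -> Optional[Callable[[str], str]]:
    """Return langdetect's `detect` with a fixed seed, or None if not installed."""
    try:
        from langdetect import DetectorFactory, detect
    except ImportError:
        return None
    # langdetect is non-deterministic by default; a fixed seed makes
    # repeated runs return the same language for the same input
    DetectorFactory.seed = 0
    return detect


def drop_swedish(sentences: Iterable[str], detect: Callable[[str], str]) -> List[str]:
    """Keep sentences whose detected language is not Swedish ("sv")."""
    kept = []
    for s in sentences:
        try:
            lang = detect(s)
        except Exception:
            # e.g. langdetect raises on text with no detectable features;
            # this sketch keeps such fragments rather than dropping them
            lang = "unknown"
        if lang != "sv":
            kept.append(s)
    return kept
```

Typical use would be `detect = make_detector()` followed, if `detect` is not None, by `drop_swedish(sentences, detect)`. Keeping undetectable fragments is a design choice of this sketch only.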

#### Building the datasets used in other estimations
* descriptive statistics data: see /cluster/descriptive_media
* sentencetransformers: see /cluster/sentencetransformers
* topic counts: see /cluster/topic_counts
* Some of these scripts may require a substantial amount of computing power, i.e. a computing cluster



